
    Entity Query Feature Expansion Using Knowledge Base Links

    Recent advances in automatic entity linking and knowledge base construction have resulted in entity annotations for document and query collections. Examples include annotations of entities from large general-purpose knowledge bases, such as Freebase and the Google Knowledge Graph. Understanding how to leverage these entity annotations of text to improve ad hoc document retrieval is an open research area. Query expansion is a commonly used technique to improve retrieval effectiveness. Most previous query expansion approaches focus on text, mainly using unigram concepts. In this paper, we propose a new technique, called entity query feature expansion (EQFE), which enriches the query with features from entities and their links to knowledge bases, including structured attributes and text. We experiment using both explicit query entity annotations and latent entities. We evaluate our technique on TREC text collections automatically annotated with knowledge base entity links, including the Google Freebase Annotations (FACC1) data. We find that entity-based feature expansion results in significant improvements in retrieval effectiveness over state-of-the-art text expansion approaches.

    Showing the scars

    This short presentation examines instances of literary hypertexts intentionally stripped of that which makes them interconnected and updatable. To investigate aspects of how and why text creators, users, and intermediaries de-enhance hypertexts for reasons entirely distinct from the much-studied antipathy to hypertextuality found in some 20th-century literary cultures, it contrasts one commercial and one non-commercial (indeed, actively anti-commercial) example: the mass phenomenon of Kindle Direct Publishing and the niche practice of fan binding. Fan bindings, where fanfiction and other fan works are printed and bound as material objects, sometimes using Print on Demand (POD) services but more often by hand, circulate in a gift economy with distinctive ethical norms and, as transformative works in their own right, illustrate how meaning is made as well as lost in uncoupling works from their fan community contexts. Juxtaposing these examples problematises conceptions of either commercial self-publishing or non-commercial fan communities as offering uncomplicated refuge for interactive literature, and challenges narratives of literary communities as enduringly hostile to or no longer interested in experimentation with hypertextuality. The presentation addresses the conference topics of authorship and reading practices from a book history perspective, highlighting the wider significance of stances against hypertextuality and implications for hypertext creators and audiences across genres.

    Content-Based Weak Supervision for Ad-Hoc Re-Ranking

    One challenge with neural ranking is the need for a large amount of manually-labeled relevance judgments for training. In contrast with prior work, we examine the use of weak supervision sources for training that yield pseudo query-document pairs that already exhibit relevance (e.g., newswire headline-content pairs and encyclopedic heading-paragraph pairs). We also propose two filtering techniques to eliminate training samples that are too far out of domain: a heuristic-based approach and a novel supervised filter that re-purposes a neural ranker. Using several leading neural ranking architectures and multiple weak supervision datasets, we show that these sources of training pairs are effective on their own (outperforming prior weak supervision techniques), and that filtering can further improve performance. Comment: SIGIR 2019 (short paper)

    Knowledge-rich Image Gist Understanding Beyond Literal Meaning

    We investigate the problem of understanding the message (gist) conveyed by images and their captions as found, for instance, on websites or news articles. To this end, we propose a methodology to capture the meaning of image-caption pairs on the basis of large amounts of machine-readable knowledge that has previously been shown to be highly effective for text understanding. Our method identifies the connotation of objects beyond their denotation: where most approaches to image understanding focus on the denotation of objects, i.e., their literal meaning, our work addresses the identification of connotations, i.e., iconic meanings of objects, to understand the message of images. We view image understanding as the task of representing an image-caption pair on the basis of a wide-coverage vocabulary of concepts such as the one provided by Wikipedia, and cast gist detection as a concept-ranking problem with image-caption pairs as queries. To enable a thorough investigation of the problem of gist understanding, we produce a gold standard of over 300 image-caption pairs and over 8,000 gist annotations covering a wide variety of topics at different levels of abstraction. We use this dataset to experimentally benchmark the contribution of signals from heterogeneous sources, namely image and text. The best result, with a Mean Average Precision (MAP) of 0.69, indicates that by combining both dimensions we are able to better understand the meaning of our image-caption pairs than when using language or vision information alone. We test the robustness of our gist detection approach when receiving automatically generated input, i.e., using automatically generated image tags or generated captions, and prove the feasibility of an end-to-end automated process.
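For readers unfamiliar with the MAP figure quoted above, the following is a minimal sketch of how Mean Average Precision is commonly computed for a set of ranked lists. The function names and the toy data are illustrative assumptions, not taken from the paper, and the authors' evaluation tooling may differ in details such as tie handling or rank cutoffs.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def average_precision(ranking: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Average precision of one ranked list against a set of relevant items."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant item occurs.
            precision_sum += hits / rank
    # Relevant items that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Mean of the per-query average precision values."""
    scores = [average_precision(ranking, relevant) for ranking, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


# Toy example (hypothetical data): two queries with ranked concept lists.
toy_runs = [
    (["a", "b", "c", "d"], {"a", "c"}),
    (["x", "y"], {"y"}),
]
toy_map = mean_average_precision(toy_runs)
```

In the gist-detection setting, each query would be an image-caption pair and each ranked item a candidate concept.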

    Retrieve-Cluster-Summarize: An Alternative to End-to-End Training for Query-specific Article Generation

    Query-specific article generation is the task of, given a search query, generating a single article that gives an overview of the topic. We envision such articles as an alternative to presenting a ranking of search results. While generative Large Language Models (LLMs) like ChatGPT also address this task, they are known to hallucinate new information, and their models are secret and hard to analyze and control. Some generative LLMs provide supporting references, yet these are often unrelated to the generated content. As an alternative, we propose to study article generation systems that integrate document retrieval, query-specific clustering, and summarization. By design, such models can provide actual citations as provenance for their generated text. In particular, we contribute an evaluation framework that allows us to separately train and evaluate each of these three components before combining them into one system. We experimentally demonstrate that a system comprised of the best-performing individual components also obtains the best F-1 overall system quality. Comment: 5 pages, 1 figure

    How Intercultural is an “Intercultural University”? Some lessons from Veracruz, Mexico

    Since the beginning of the 21st century, a new institutional figure has started to appear in the arena of Mexican higher education: the so-called “intercultural university”. What was first presented and conceived just as another link in the chain of preschool, primary and increasingly also post-primary schools “with an intercultural and bilingual approach”, created in and for the indigenous and multilingual regions of Mexico, now starts to have characteristics of a new university subsystem destined to provide an academic training which is supposed to be culturally relevant to students who are defined as diverse and different in ethnic, linguistic and/or cultural terms. In practice, this new educational offer is focused on students from indigenous regions who have been excluded from formal higher education and have had access only recently to complete basic education and also gradual access to upper secondary education. In this contribution, we briefly sketch the general tendencies that characterize this emerging educational subsystem, before illustrating a case study which stems from a collaborative ethnography that we are conducting with one of the intercultural universities, the Universidad Veracruzana Intercultural (UVI), in order to finally draw some conclusions on the allegedly “intercultural” character of this new educational institution. Key words: intercultural education; intercultural university; collaborative ethnography; Veracruz

    Entity relatedness for retrospective analyses of global events

    Tracking global events through time would ease many diachronic analyses which are currently carried out manually by social scientists. While entity linking algorithms can be adapted to track events that go by a common name, such a name is often not established in early stages leading up to the event. This study evaluates the utility of entity relatedness for the task of identifying related entities and textual resources that describe the involvement of the entity in the event. In a small study, we find that simple relatedness methods obtain a MAP score of 0.74, outperforming many advanced baseline systems such as Stics and Wiki2Vec. A small adaptation of this method provides sufficient explanations of entity involvement for 68% of relevant entities.

    Local and global query expansion for hierarchical complex topics

    In this work, we study local and global methods for query expansion for multifaceted complex topics. We study word-based and entity-based expansion methods and extend these approaches to complex topics using fine-grained expansion on different elements of the hierarchical query structure. For a source of hierarchical complex topics we use the TREC Complex Answer Retrieval (CAR) benchmark data collection. We find that leveraging the hierarchical topic structure is needed for both local and global expansion methods to be effective. Further, the results demonstrate that entity-based expansion methods show significant gains over word-based models alone, with local feedback providing the largest improvement. The results on the CAR paragraph retrieval task demonstrate that expansion models that incorporate both the hierarchical query structure and entity-based expansion result in a greater than 20% improvement over word-based expansion approaches.
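As a rough illustration of what "local" expansion means here, the sketch below implements a simplified pseudo-relevance feedback step in the spirit of relevance models: terms from the top-ranked feedback documents are weighted by document score and interpolated with the original query. This is an assumption-laden toy, not the method evaluated in the paper; the function name, parameters, and example data are all hypothetical. The same scheme could be applied to entity identifiers instead of words by passing entity IDs as tokens.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def expand_query(
    query_terms: Sequence[str],
    feedback_docs: Sequence[Tuple[List[str], float]],
    num_terms: int = 3,
    orig_weight: float = 0.5,
) -> Dict[str, float]:
    """Return term weights for an expanded query.

    feedback_docs holds (tokens, retrieval_score) pairs for the top-ranked
    documents of an initial retrieval run (assumed to be given).
    """
    total_score = sum(score for _, score in feedback_docs)
    expansion: Dict[str, float] = defaultdict(float)
    if total_score > 0:
        for tokens, score in feedback_docs:
            if not tokens:
                continue
            counts = Counter(tokens)
            for term, count in counts.items():
                # Document weight times the term's relative frequency in it.
                expansion[term] += (score / total_score) * (count / len(tokens))

    # Keep the highest-weighted expansion terms (ties broken alphabetically).
    top = sorted(expansion.items(), key=lambda kv: (-kv[1], kv[0]))[:num_terms]
    norm = sum(weight for _, weight in top)

    weights: Dict[str, float] = defaultdict(float)
    for term in query_terms:
        weights[term] += orig_weight / len(query_terms)
    if norm > 0:
        for term, weight in top:
            weights[term] += (1.0 - orig_weight) * weight / norm
    return dict(weights)


# Toy example (hypothetical data): one query term, two feedback documents.
toy_weights = expand_query(
    ["jaguar"],
    [(["jaguar", "cat", "cat"], 2.0), (["jaguar", "car"], 1.0)],
    num_terms=2,
)
```

A "global" method would instead draw expansion terms from a resource that is independent of the initial ranking, such as corpus-wide co-occurrence statistics or a knowledge base.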

    Inferring functional modules of protein families with probabilistic topic models

    Background: Genome and metagenome studies have identified thousands of protein families whose functions are poorly understood and for which techniques for functional characterization provide only partial information. For such proteins, the genome context can give further information about their functional context. Results: We describe a Bayesian method, based on a probabilistic topic model, which directly identifies functional modules of protein families. The method explores the co-occurrence patterns of protein families across a collection of sequence samples to infer a probabilistic model of arbitrarily-sized functional modules. Conclusions: We show that our method identifies protein modules - some of which correspond to well-known biological processes - that are tightly interconnected with known functional interactions and are different from the interactions identified by pairwise co-occurrence. The modules are not specific to any given organism and may combine different realizations of a protein complex or pathway within different taxa.